weighted f1-score
Evaluating Supervised Learning Models for Fraud Detection: A Comparative Study of Classical and Deep Architectures on Imbalanced Transaction Data
Wang, Chao, Nie, Chuanhao, Liu, Yunbo
Fraud detection remains a critical task in high-stakes domains such as finance and e-commerce, where undetected fraudulent transactions can lead to significant economic losses. In this study, we systematically compare the performance of four supervised learning models - Logistic Regression, Random Forest, Light Gradient Boosting Machine (LightGBM), and a Gated Recurrent Unit (GRU) network - on a large-scale, highly imbalanced online transaction dataset. While ensemble methods such as Random Forest and LightGBM demonstrated superior performance in both overall and class-specific metrics, Logistic Regression offered a reliable and interpretable baseline. The GRU model showed strong recall for the minority fraud class, though at the cost of precision, highlighting a trade-off relevant for real-world deployment. Our evaluation emphasizes not only weighted averages but also per-class precision, recall, and F1-scores, providing a nuanced view of each model's effectiveness in detecting rare but consequential fraudulent activity. The findings underscore the importance of choosing models based on the specific risk tolerance and operational needs of fraud detection systems.
- North America > United States > Texas > Harris County > Houston (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Law Enforcement & Public Safety > Fraud (1.00)
- Information Technology > Services > e-Commerce Services (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.91)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.91)
Improving watermelon (Citrullus lanatus) disease classification with generative artificial intelligence (GenAI)-based synthetic and real-field images via a custom EfficientNetV2-L model
Rai, Nitin, Boyd, Nathan S., Vallad, Gary E., Schumann, Arnold W.
The current advancements in generative artificial intelligence (GenAI) models have paved the way for new possibilities for generating high-resolution synthetic images, thereby offering a promising alternative to traditional image acquisition for training computer vision models in agriculture. In the context of crop disease diagnosis, GenAI models are being used to create synthetic images of various diseases, potentially facilitating model creation and reducing the dependency on resource-intensive in-field data collection. However, limited research has been conducted on evaluating the effectiveness of integrating real with synthetic images to improve disease classification performance. Therefore, this study aims to investigate whether combining a limited number of real images with synthetic images can enhance the prediction accuracy of an EfficientNetV2-L model for classifying watermelon \textit{(Citrullus lanatus)} diseases. The training dataset was divided into five treatments: H0 (only real images), H1 (only synthetic images), H2 (1:1 real-to-synthetic), H3 (1:10 real-to-synthetic), and H4 (H3 + random images to improve variability and model generalization). All treatments were trained using a custom EfficientNetV2-L architecture with enhanced fine-tuning and transfer learning techniques. Models trained on H2, H3, and H4 treatments demonstrated high precision, recall, and F1-score metrics. Additionally, the weighted F1-score increased from 0.65 (on H0) to 1.00 (on H3-H4) signifying that the addition of a small number of real images with a considerable volume of synthetic images improved model performance and generalizability. Overall, this validates the findings that synthetic images alone cannot adequately substitute for real images; instead, both must be used in a hybrid manner to maximize model performance for crop disease classification.
- North America > United States > Texas (0.04)
- North America > United States > California (0.04)
- Africa > Mali (0.04)
- Health & Medicine (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Food & Agriculture > Agriculture (1.00)
Creating and Evaluating Code-Mixed Nepali-English and Telugu-English Datasets for Abusive Language Detection Using Traditional and Deep Learning Models
Pandey, Manish, Yadav, Nageshwar Prasad, Adduru, Mokshada, Rai, Sawan
With the growing presence of multilingual users on social media, detecting abusive language in code-mixed text has become increasingly challenging. Code-mixed communication, where users seamlessly switch between English and their native languages, poses difficulties for traditional abuse detection models, as offensive content may be context-dependent or obscured by linguistic blending. While abusive language detection has been extensively explored for high-resource languages like English and Hindi, low-resource languages such as Telugu and Nepali remain underrepresented, leaving gaps in effective moderation. In this study, we introduce a novel, manually annotated dataset of 2 thousand Telugu-English and 5 Nepali-English code-mixed comments, categorized as abusive and non-abusive, collected from various social media platforms. The dataset undergoes rigorous preprocessing before being evaluated across multiple Machine Learning (ML), Deep Learning (DL), and Large Language Models (LLMs). We experimented with models including Logistic Regression, Random Forest, Support Vector Machines (SVM), Neural Networks (NN), LSTM, CNN, and LLMs, optimizing their performance through hyperparameter tuning, and evaluate it using 10-fold cross-validation and statistical significance testing (t-test). Our findings provide key insights into the challenges of detecting abusive language in code-mixed settings and offer a comparative analysis of computational approaches. This study contributes to advancing NLP for low-resource languages by establishing benchmarks for abusive language detection in Telugu-English and Nepali-English code-mixed text. The dataset and insights can aid in the development of more robust moderation strategies for multilingual social media environments.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Nepal (0.04)
- Asia > India (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Online Social Support Detection in Spanish Social Media Texts
Tash, Moein Shahiki, Ramos, Luis, Ahani, Zahra, Monroy, Raul, kolesnikova, Olga, Calvo, Hiram, Sidorov, Grigori
The advent of social media has transformed communication, enabling individuals to share their experiences, seek support, and participate in diverse discussions. While extensive research has focused on identifying harmful content like hate speech, the recognition and promotion of positive and supportive interactions remain largely unexplored. This study proposes an innovative approach to detecting online social support in Spanish-language social media texts. We introduce the first annotated dataset specifically created for this task, comprising 3,189 YouTube comments classified as supportive or non-supportive. To address data imbalance, we employed GPT-4o to generate paraphrased comments and create a balanced dataset. We then evaluated social support classification using traditional machine learning models, deep learning architectures, and transformer-based models, including GPT-4o, but only on the unbalanced dataset. Subsequently, we utilized a transformer model to compare the performance between the balanced and unbalanced datasets. Our findings indicate that the balanced dataset yielded improved results for Task 2 (Individual and Group) and Task 3 (Nation, Other, LGBTQ, Black Community, Women, Religion), whereas GPT-4o performed best for Task 1 (Social Support and Non-Support). This study highlights the significance of fostering a supportive online environment and lays the groundwork for future research in automated social support detection.
- South America (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- North America > Central America (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)
- Law > Civil Rights & Constitutional Law (0.46)
Enhanced Rooftop Solar Panel Detection by Efficiently Aggregating Local Features
Kurte, Kuldeep, Kulkarni, Kedar
In this paper, we present an enhanced Convolutional Neural Network (CNN)-based rooftop solar photovoltaic (PV) panel detection approach using satellite images. We propose to use pre-trained CNN-based model to extract the local convolutional features of rooftops. These local features are then combined using the Vectors of Locally Aggregated Descriptors (VLAD) technique to obtain rooftop-level global features, which are then used to train traditional Machine Learning (ML) models to identify rooftop images that do and do not contain PV panels. On the dataset used in this study, the proposed approach achieved rooftop-PV classification scores exceeding the predefined threshold of 0.9 across all three cities for each of the feature extractor networks evaluated. Moreover, we propose a 3-phase approach to enable efficient utilization of the previously trained models on a new city or region with limited labelled data. We illustrate the effectiveness of this 3-phase approach for multi-city rooftop-PV detection task.
- Europe > Netherlands (0.04)
- Asia > India > Maharashtra (0.04)
Depression detection in social media posts using transformer-based models and auxiliary features
Kerasiotis, Marios, Ilias, Loukas, Askounis, Dimitris
The detection of depression in social media posts is crucial due to the increasing prevalence of mental health issues. Traditional machine learning algorithms often fail to capture intricate textual patterns, limiting their effectiveness in identifying depression. Existing studies have explored various approaches to this problem but often fall short in terms of accuracy and robustness. To address these limitations, this research proposes a neural network architecture leveraging transformer-based models combined with metadata and linguistic markers. The study employs DistilBERT, extracting information from the last four layers of the transformer, applying learned weights, and averaging them to create a rich representation of the input text. This representation, augmented by metadata and linguistic markers, enhances the model's comprehension of each post. Dropout layers prevent overfitting, and a Multilayer Perceptron (MLP) is used for final classification. Data augmentation techniques, inspired by the Easy Data Augmentation (EDA) methods, are also employed to improve model performance. Using BERT, random insertion and substitution of phrases generate additional training data, focusing on balancing the dataset by augmenting underrepresented classes. The proposed model achieves weighted Precision, Recall, and F1-scores of 84.26%, 84.18%, and 84.15%, respectively. The augmentation techniques significantly enhance model performance, increasing the weighted F1-score from 72.59% to 84.15%.
- North America > United States > Texas > Travis County > Austin (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Spain > Aragón (0.04)
- (6 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
- Research Report > Promising Solution (0.68)
- Information Technology > Services (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
- Health & Medicine > Consumer Health (1.00)
Evaluating Algorithmic Bias in Models for Predicting Academic Performance of Filipino Students
Švábenský, Valdemar, Verger, Mélina, Rodrigo, Maria Mercedes T., Monterozo, Clarence James G., Baker, Ryan S., Saavedra, Miguel Zenon Nicanor Lerias, Lallé, Sébastien, Shimada, Atsushi
Algorithmic bias is a major issue in machine learning models in educational contexts. However, it has not yet been studied thoroughly in Asian learning contexts, and only limited work has considered algorithmic bias based on regional (sub-national) background. As a step towards addressing this gap, this paper examines the population of 5,986 students at a large university in the Philippines, investigating algorithmic bias based on students' regional background. The university used the Canvas learning management system (LMS) in its online courses across a broad range of domains. Over the period of three semesters, we collected 48.7 million log records of the students' activity in Canvas. We used these logs to train binary classification models that predict student grades from the LMS activity. The best-performing model reached AUC of 0.75 and weighted F1-score of 0.79. Subsequently, we examined the data for bias based on students' region. Evaluation using three metrics: AUC, weighted F1-score, and MADD showed consistent results across all demographic groups. Thus, no unfairness was observed against a particular student group in the grade predictions.
- Asia > Philippines > Luzon > National Capital Region > City of Manila (0.05)
- Asia > India > Karnataka > Bengaluru (0.05)
- Asia > Japan > Kyūshū & Okinawa > Kyūshū (0.04)
- (8 more...)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
- Instructional Material > Course Syllabus & Notes (0.46)
- Education > Educational Setting > Online (1.00)
- Education > Educational Technology > Educational Software > Computer Based Training (0.48)
Exploring the Capability of ChatGPT to Reproduce Human Labels for Social Computing Tasks (Extended Version)
Zhu, Yiming, Zhang, Peixian, Haq, Ehsan-Ul, Hui, Pan, Tyson, Gareth
Harnessing the potential of large language models (LLMs) like ChatGPT can help address social challenges through inclusive, ethical, and sustainable means. In this paper, we investigate the extent to which ChatGPT can annotate data for social computing tasks, aiming to reduce the complexity and cost of undertaking web research. To evaluate ChatGPT's potential, we re-annotate seven datasets using ChatGPT, covering topics related to pressing social issues like COVID-19 misinformation, social bot deception, cyberbully, clickbait news, and the Russo-Ukrainian War. Our findings demonstrate that ChatGPT exhibits promise in handling these data annotation tasks, albeit with some challenges. Across the seven datasets, ChatGPT achieves an average annotation F1-score of 72.00%. Its performance excels in clickbait news annotation, correctly labeling 89.66% of the data. However, we also observe significant variations in performance across individual labels. Our study reveals predictable patterns in ChatGPT's annotation performance. Thus, we propose GPT-Rater, a tool to predict if ChatGPT can correctly label data for a given annotation task. Researchers can use this to identify where ChatGPT might be suitable for their annotation requirements. We show that GPT-Rater effectively predicts ChatGPT's performance. It performs best on a clickbait headlines dataset by achieving an average F1-score of 95.00%. We believe that this research opens new avenues for analysis and can reduce barriers to engaging in social computing research.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.94)
Knowledge of Pretrained Language Models on Surface Information of Tokens
Hiraoka, Tatsuya, Okazaki, Naoaki
Do pretrained language models have knowledge regarding the surface information of tokens? We examined the surface information stored in word or subword embeddings acquired by pretrained language models from the perspectives of token length, substrings, and token constitution. Additionally, we evaluated the ability of models to generate knowledge regarding token surfaces. We focused on 12 pretrained language models that were mainly trained on English and Japanese corpora. Experimental results demonstrate that pretrained Figure 1: Input and output examples when asking GPT-language models have knowledge regarding token 3.5 Turbo about the surface information of words (as length and substrings but not token constitution. of 1st, Jan. 2024). The Japanese example has the same Additionally, the results imply that there meaning as the English text, asking the length of and is a bottleneck on the decoder side in terms of third character in 人類学者 (anthropologist).
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- (4 more...)
Performance of the Pre-Trained Large Language Model GPT-4 on Automated Short Answer Grading
Automated Short Answer Grading (ASAG) has been an active area of machine-learning research for over a decade. It promises to let educators grade and give feedback on free-form responses in large-enrollment courses in spite of limited availability of human graders. Over the years, carefully trained models have achieved increasingly higher levels of performance. More recently, pre-trained Large Language Models (LLMs) emerged as a commodity, and an intriguing question is how a general-purpose tool without additional training compares to specialized models. We studied the performance of GPT-4 on the standard benchmark 2-way and 3-way datasets SciEntsBank and Beetle, where in addition to the standard task of grading the alignment of the student answer with a reference answer, we also investigated withholding the reference answer. We found that overall, the performance of the pre-trained general-purpose GPT-4 LLM is comparable to hand-engineered models, but worse than pre-trained LLMs that had specialized training.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > District of Columbia > Washington (0.04)
- Asia > Middle East > Jordan (0.04)
- Education > Educational Setting (0.70)
- Education > Educational Technology (0.47)
- Information Technology > Security & Privacy (0.46)
- Education > Assessment & Standards > Student Performance (0.38)